ABSTRACT

Recent developments in econometrics that are relevant
to the task of estimating costs in higher education are reviewed. The
relative effectiveness of alternative statistical procedures for
estimating costs is also tested. Statistical cost estimation
involves three basic parts: a model, a data set, and an estimation
procedure. Actual data are used to assess whether ridge
techniques provide a viable alternative to the more familiar ordinary
least squares approach within the collinear environment
characteristic of translog models. The translog model used
for the study generates marginal cost estimates for full-time and
part-time students at two-year colleges. In every comparison
conducted for the study, the ridge procedure was superior to the
ordinary least squares approach. Of particular importance were ridge
improvements in the precision and stability of estimated
coefficients, since marginal cost estimates are a function of a set
of coefficients. In addition, ridge regression provided a means
for data and model exploration. Comparing ordinary least squares and
ridge estimates, and especially examining ridge traces and
variance inflation factors, can also promote understanding of the
effects of multicollinearity (i.e., highly correlated explanatory
variables) in a given situation. Algorithms and graphs are included.

Statistical Cost Estimation in Higher Education:
Some Alternatives

The need to understand cost behavior is a perennial one in higher education. The
driving force underlying the need may differ from one place to another, and it certainly
changes over time. For instance, it could be the task of evaluating the cost structure of
programs at institutions competing for the same support dollars, or of ascertaining the
conditions under which economies of scale might be expected to accompany enrollment
growth. At the present time, there is much concern about what may happen to unit costs
when enrollments decline. Whatever the motivation, the cost analyst is faced with the
continuing challenge of finding new and, presumably, better ways of understanding costs.

As the extensive survey by Adams, Hankins, and Schroeder (1978) makes quite
clear, most of what passes for cost analysis in higher education is essentially a cost
calculation of one sort or another. The ubiquitous average cost per student credit hour
is a case in point. This cost figure can be calculated directly, given data on total
costs and total credit hours. But other costs, such as the cost of an additional credit
hour, are usually not directly calculable. Instead they must be estimated, using either
statistical or accounting procedures. The statistical approach is far more common, and
constitutes the focal point for this study. Specifically, the intent of this study is (1)
to review some recent developments in econometrics that are relevant to the task of
estimating costs in higher education, and (2) to test the relative effectiveness of
alternative statistical procedures for estimating costs. The material included should be
useful to researchers and analysts who need to estimate higher-education costs, and
to those who are interested in statistical cost estimation more generally.

Statistical cost estimation involves three basic parts: a model, a data set, and an
estimation procedure. For present purposes, data-related issues will be dealt with
summarily because these issues (for instance, the quality of financial data, the problem
of finding acceptable output measures, and so on) are generally familiar ones for the
higher-education analyst. By contrast, the thinking among econometricians regarding the
structure of cost estimation models has undergone considerable development over the years,
and may not be as familiar. Similarly, there have been developments in estimation
procedures that have not received much attention in higher-education circles, but are
worth considering here. Of course, this paper cannot offer the broad coverage appropriate
to textbooks. The selection problem is made easier, though, by the fact that econometric
thinking on models has converged somewhat, and because some of the preferred models can
lead to statistical problems which in turn make certain estimation procedures more
attractive. To put names to these matters, the models in question are translog cost
functions, the statistical problem is multicollinearity, and the estimation procedure is
ridge regression. Actual data will be used in assessing whether the ridge techniques
provide a viable alternative to the more familiar ordinary least squares (OLS) approach
within the collinear environment characteristic of translog models. The translog model
that is used for the study generates marginal cost estimates for full- and part-time
students at two-year colleges.

Cost Estimation 

The behavior of costs in a particular industry, or more generally, the production
structure of an industry, can be analyzed by estimating either a production function or a
cost function. The procedures have been shown to be theoretically equivalent by Shephard
(1953), for the single-product firm, and by McFadden (1978), for the multiproduct firm.
When cost structure is the primary concern, estimating a cost function is the most direct
approach. And, when the industry in question consists of multiproduct firms, as is
certainly the case with respect to higher education, estimating a joint cost function
offers the distinct advantage of making it relatively easy to model the structure of cost
without imposing a priori restrictions on the structure of production, restrictions which
are typically imposed when modeling the structure of multiproduct firms by estimating
transformation (production) functions (Brown, Caves, and Christensen 1979).

The implicit form of a cost function can be written as

$$C = f(q, p, t) \tag{1}$$

where C is total cost, q is output, p is the input price, and t is a set of technological
conditions that may have some effect on the relationship between C and q (McFadden 1978).
Under theoretically ideal conditions (that is, intent to minimize costs coupled with full
knowledge of how to do so), the cost function specifies the minimum cost for a given level
of output. Whether such conditions ever hold entirely is doubtful. It is certainly
unlikely that they hold for higher education; Bowen (1980) makes this point rather
emphatically. (Pauly [1978] makes the same point for hospitals.) Most estimated cost
functions, then, actually represent average rather than minimizing behavior. Cohn (1979)
refers to such cost functions as "approximate."

Developing an explicit form for the cost function in a given situation is the essence
of the modeling problem. Preference for particular types of explicit functional forms has
changed over the years. Some of the earliest examples of cost functions date from Dean's
studies in the 1930s of retail trade stores (reprinted in Dean 1976). These early efforts
typically employed simple additive models at best. For instance, Yntema (1940) used the
function

$$C = \alpha_0 + \alpha_1 q + \varepsilon \tag{2}$$

to estimate marginal costs for the steel industry.

With the advent of computers in the post-WWII era, and the growing interest in
econometrics, functional forms gradually became more complex, and, one might say, more
thoughtful. That is, more attention was paid to the intervening or secondary variables
that might influence the cost-output relationship, to the form of the function
(particularly as it related to economic theory), and to the form of the variables (for
example, raw versus logarithmically transformed) included in the estimating equation.
Johnston's 1960 textbook on statistical cost estimation provides an excellent review of
developments to that point (including criticisms of the various procedures). A review of
the early period can also be found in Dean (1976) and Walters (1963).

Despite the variety of functional forms employed, virtually all cost functions up
through the early 1970s had one important feature in common. They all imposed a priori
restrictions on the cost and production structure. For example, in that simplest of forms
shown above as equation 2, one restriction imposed (apart from consideration of price and
technical conditions) is that marginal cost, the change in total cost (C) associated with
an additional unit of output (q), must be constant; it can only be the estimated value of
the parameter $\alpha_1$, regardless of the value of q or anything else for that matter. Other
functional forms were less restrictive, but it was not until the 1970s that so-called
flexible forms, which impose few if any restrictions, began to be used with some
frequency. Diewert (1974) reviews several of the flexible forms that are designed for
joint cost functions (where more than one type of output is involved). Griffin (1982)
compares the approximation characteristics of three flexible forms: the generalized
Leontief, the translog, and the generalized square-root quadratic. Of these forms, the
translog function proposed by Christensen, Jorgenson, and Lau (1971; 1973) appears to be
the most widely adopted (for example, see Brown, Caves, and Christensen [1979], Cowing and
Holtmann [1983], and Spady [1979]). A variety of discussions and applications of the
translog cost function can be found in Smith (1982).

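The translog joint cost function for $l$ outputs, $m$ inputs, and $n$ technical conditions
can be written

$$
\begin{aligned}
\ln C = \alpha_0 &+ \sum_{i=1}^{l} \alpha_i \ln q_i + \sum_{i=1}^{m} \beta_i \ln p_i + \sum_{j=1}^{n} \gamma_j \ln t_j \\
&+ \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_{ij} \ln q_i \ln q_j
 + \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \beta_{ij} \ln p_i \ln p_j
 + \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \gamma_{ij} \ln t_i \ln t_j \\
&+ \sum_{i=1}^{l} \sum_{j=1}^{m} \delta_{ij} \ln q_i \ln p_j
 + \sum_{i=1}^{l} \sum_{j=1}^{n} \rho_{ij} \ln q_i \ln t_j
 + \sum_{i=1}^{m} \sum_{j=1}^{n} \phi_{ij} \ln p_i \ln t_j
\end{aligned}
\tag{3}
$$

where $\alpha_{ij} = \alpha_{ji}$, etc. The expression (3) has one neutral scale parameter ($\alpha_0$), $l + m + n$
first-order parameters ($\alpha_i, \beta_i, \gamma_j$), and $(l+1)(l/2) + (m+1)(m/2) + (n+1)(n/2) + lm + ln + mn$
second-order parameters. The only restriction imposed in using this form is the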
regularity condition that, if the function is to be understood as a cost function in the
strict sense, then the function must exhibit homogeneity of degree one in factor prices
(that is, total cost must change in the same proportion and direction as a change in
factor prices). Otherwise there are none of the typical restrictions on the cost and
production structure of the firms analyzed. Indeed, what were once a priori impositions
on structure now become testable hypotheses. The second-order logarithmic terms for the
output variables allow for two inflection points in the estimated cost curves, thus
allowing for economies or diseconomies of scale, while the complete set of interaction
terms removes any separability assumptions from the model. As Brown, Caves, and
Christensen (1979) have shown, imposed restrictions such as homogeneity and separability
of output can make a significant difference in the results (parameter estimates) of the
analysis.
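
Because marginal cost in a translog model is computed from several coefficients rather than read from a single parameter, a short derivation may help. The notation here is illustrative and assumed rather than taken from the study: $\alpha_i$ and $\alpha_{ij}$ denote first- and second-order output coefficients (with $\alpha_{ij} = \alpha_{ji}$ and a factor of one-half on the second-order sum), while $\delta_{ij}$ and $\rho_{ik}$ are hypothetical labels for the output-price and output-condition interaction coefficients.

```latex
\frac{\partial \ln C}{\partial \ln q_i}
  = \alpha_i + \sum_{j} \alpha_{ij} \ln q_j
    + \sum_{j} \delta_{ij} \ln p_j + \sum_{k} \rho_{ik} \ln t_k ,
\qquad
MC_i = \frac{\partial C}{\partial q_i}
     = \frac{C}{q_i} \, \frac{\partial \ln C}{\partial \ln q_i} .
```

Under these assumptions, imprecision in any coefficient entering the sum carries over directly to the marginal cost estimate, which is why coefficient stability matters for the comparisons that follow.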

Clearly, then, in the absence of a priori knowledge about the structure of
production, there is good reason to adopt a flexible functional form, such as the translog
model shown above, that lets the data speak for themselves. At the same time, models such
as the translog are prone to the estimation problem known as multicollinearity, in which
correlation among the explanatory variables can hide or distort their true relationship to
the dependent variable (for example, see Cowing and Holtmann [1983]). Zero-order
correlations tend to be quite high (r > .95) between a value in logarithms, its square, and
related interaction terms. In estimating marginal costs in higher education, the
situation may be exacerbated by what one might call "natural," as opposed to
"model-induced," multicollinearity. That is, the key explanatory variables one might
typically consider in estimating cost functions for colleges and universities tend to be
collinear. For example, there would be utility in being able to compare the marginal
costs of lower versus upper division students, but these two enrollment levels tend to
vary together (Allen and Brinkman 1983). This naturally occurring phenomenon, when
combined with the tendency of the translog model to generate highly collinear explanatory
variables, creates a situation in which multicollinearity is likely to be a serious
problem.
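
The "model-induced" effect can be illustrated with a minimal sketch. The enrollment figures below are simulated and hypothetical (drawn independently from assumed lognormal distributions), not the study's data, and the variable names are illustrative only. Even though the two logged outputs are unrelated by construction, a logged output is almost perfectly correlated with its own square.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical enrollment counts drawn independently; these are NOT the study's data.
fts = rng.lognormal(mean=7.5, sigma=0.8, size=500)   # full-time students
pts = rng.lognormal(mean=7.0, sigma=0.9, size=500)   # part-time students

ln_fts = np.log(fts)
ln_pts = np.log(pts)

def corr(a, b):
    """Zero-order (Pearson) correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# "Natural" collinearity: absent by construction here, apart from sampling noise.
r_natural = corr(ln_fts, ln_pts)

# "Model-induced" collinearity: a logged variable against its own square.
r_square = corr(ln_fts, ln_fts ** 2)

print(f"ln FTS vs ln PTS:      {r_natural:.3f}")
print(f"ln FTS vs (ln FTS)^2:  {r_square:.3f}")
```

The high correlation arises because the logged values lie far from zero relative to their spread, so squaring them is close to a linear rescaling over the observed range.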

Thus, in considering how best to proceed in statistically estimating costs for
higher-education institutions, the analyst is faced with a dilemma. What has emerged in
econometrics as the preferred form for the joint cost function, a highly flexible translog
model, brings with it the threat of severe multicollinearity, capable of distorting the
very results whose integrity is protected by the flexibility of the model. Before
considering the appropriateness of estimation techniques designed to get around this
dilemma, we look more closely at the problem of multicollinearity itself.

Multicollinearity

When explanatory variables are highly correlated, regression coefficients estimated
by applying an ordinary least squares criterion suffer from a number of problems. These
include

1) The precision of estimation falls so that it becomes very difficult if not
   impossible to disentangle the relative influences of the various variables. The
   loss of precision has three aspects: specific estimates may have very large
   errors; these errors may be highly correlated with one another; and the sampling
   variances of the coefficients will be very large.

2) Investigators are sometimes led to drop a variable incorrectly from an analysis
   when its coefficient is not significantly different from zero due to
   collinearity, rather than to the absence of a relationship with the dependent
   variable.

3) Estimates of coefficients are very sensitive to particular sets of sample data;
   the addition or deletion of a few observations can sometimes produce dramatic
   shifts in the coefficients.

4) Estimates of coefficients are very sensitive to the addition or deletion of a
   variable in the model.

The multicollinearity problem is discussed extensively in the literature of
econometrics and statistics. These discussions may be roughly divided into two broad
categories: (1) those dealing with its nature and potential consequences (e.g., Blalock,
1963, 1964; Darlington, 1968; Goldberger, 1964; Gordon, 1968; Johnston, 1972; Kumar, 1975;
Leamer, 1973; Wichers, 1975); and (2) those discussing strategies for dealing with the
problem, such as variable selection (Gorman and Toman, 1966; Gunst and Mason, 1977;
Hocking, 1976), reduction to canonical form (Baranchik, 1970; Chatterjee and Price, 1977),
and biased estimation procedures. Techniques discussed under biased estimation procedures
include Stein estimators (Mallows, 1973; Mayer and Wilkie, 1973; Sclove, 1968), Bayesian
estimators (Leamer, 1973; Lindley and Smith, 1972; Theil, 1963), ridge estimators
(Bulcock, Lee, and Luck, 1977; Darlington, 1979; Dempster, Schatzoff and Wermuth, 1977;
Hoerl and Kennard, 1970; Marquardt, 1970; Vinod, 1978), and generalized inverse or
fractional rank estimation (Hemmerle, 1975; Marquardt, 1970).

In strict mathematical terms, collinearity is said to exist if there are one or more
linear dependencies between predictor variables (Silvey, 1969). Less restrictive
definitions (e.g., Willan and Watts, 1973) suggest that collinearity exists when linear
relationships hold approximately. Farrar and Glauber (1967) define collinearity as a
statistical rather than a mathematical condition. Viewed from this latter perspective,
the task becomes one of identifying the degree of collinearity and its effects.

Several indices are available for describing both the degree of ill-conditioning in
the data and its effects on estimated coefficients. These include variance inflation
factors (Marquardt, 1970), the mean squared error of the estimated coefficient vector
(Hoerl and Kennard, 1970), the squared length of the estimated coefficient vector (Hoerl
and Kennard, 1970), the forecasting error variance (Johnston, 1972), and the ridge trace
of the standardized regression coefficients (Hoerl and Kennard, 1970). Statistical tests
for the degree of ill-conditioning in the data have also been suggested by Bartlett
(1950), Farrar and Glauber (1967), Haitovsky (1969), and Wichers (1975). Chatterjee and
Price (1977) demonstrate how the method of principal components analysis can be used to
locate collinear relationships.

We have found two of the above indices to be particularly useful: the variance
inflation factors (VIFs) suggested by Marquardt (1970), and the ridge trace developed by
Hoerl and Kennard (1970). The VIFs for a particular model are readily obtained from the
diagonal of the inverse of the correlation matrix of the predictor variables, $(X'X)^{-1}$.
More precisely, we can see from the equation

$$V(\hat{\beta}) = \sigma^2 (X'X)^{-1} \tag{4}$$

that the precision of an estimated regression coefficient is measured by its variance,
which is proportional to $\sigma^2$, the error variance of the regression model. The constant of
proportionality for a given $\hat{\beta}_i$ is taken from the i-th term of the principal diagonal of
$(X'X)^{-1}$. The constant of proportionality is referred to as the "variance inflation
factor" for $\hat{\beta}_i$ (Marquardt 1970).

It is easily demonstrated that the VIF for a given $\hat{\beta}_i$ is equal to $1/(1 - R_i^2)$, where $R_i^2$
is the square of the multiple correlation coefficient from the regression of the i-th
explanatory variable on all other explanatory variables in the equation. Hence as $R_i^2$
tends toward 1.0, indicating the presence of a linear relationship between the explanatory
variables, the VIF for $\hat{\beta}_i$ tends to infinity, as does the associated variance estimate. The
estimated variance for any specific coefficient may then be written as

$$V(\hat{\beta}_i) = \frac{\sigma^2}{1 - R_i^2} \tag{5}$$

It has been suggested that VIFs greater than 10.0 indicate that
multicollinearity may be causing estimation problems (Chatterjee and Price, 1977; Marquardt
and Snee, 1975). A VIF of 10.0 for a particular explanatory variable $X_j$ implies a
multiple correlation of .95 when $X_j$ is regressed on the other explanatory variables in
the model.
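
The two routes to a VIF described above (the diagonal of the inverse correlation matrix, and $1/(1 - R_i^2)$ from an auxiliary regression) can be checked against each other numerically. The sketch below uses simulated data and hypothetical function names; it is an illustration under those assumptions, not the study's algorithm.

```python
import numpy as np

def vif_from_correlation(X):
    """VIFs as the diagonal of the inverse correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def vif_from_auxiliary_regression(X, i):
    """VIF for column i as 1 / (1 - R_i^2), regressing column i on the other columns."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # include an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Simulated predictors (assumed for illustration): x2 tracks x1 closely, x3 does not.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + 0.3 * rng.normal(size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

vifs = vif_from_correlation(X)
vif_check = np.array([vif_from_auxiliary_regression(X, i) for i in range(X.shape[1])])

# Rule of thumb from the text: VIF = 10 means R^2 = 0.9, so R is about .95.
r_at_vif_10 = np.sqrt(1.0 - 1.0 / 10.0)

print("VIFs (inverse correlation matrix):", np.round(vifs, 2))
print("VIFs (auxiliary regressions):     ", np.round(vif_check, 2))
print("Multiple correlation at VIF = 10: ", round(float(r_at_vif_10), 3))
```

The inverse-correlation route is usually the more convenient one, since a single matrix inversion yields the VIFs for every explanatory variable at once.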

The second method for detecting multicollinearity, the ridge trace method, flows
directly out of the ridge analysis which will be employed in this study as an alternative
to the OLS procedure. The ridge trace method will be discussed and demonstrated in the
sections that follow.

Ridge Estimators

As previously noted, the particular class of biased estimators employed in the
present study are the ridge estimators first proposed by Hoerl and Kennard (1970). Ridge
estimators were chosen for three reasons. First, they are designed to be more reliable
than the least squares estimator in the presence of an ill-conditioned data matrix.
Second, the "ridge trace" conveys both the degree of ill-conditioning and the imprecision
inherent in interpreting collinear data. Third, ridge-type solutions provide estimates
under varying sample and collinearity conditions which appear to be at least as good as, if
not better than, available alternatives (cf. Dempster, Schatzoff and Wermuth, 1977; Hoerl,
Kennard and Baldwin, 1975; McDonald and Galarneau, 1975; Vinod, 1978). With this in mind,
we will proceed with the development of the general class of ridge-type estimators.

The least squares estimate of $\beta$ may be written in a more general form as

$$\hat{\beta}(k) = (X'X + kI)^{-1} X'y \tag{6}$$

where k is a scalar. In the least squares estimator, k = 0, so the above equation reduces
to the familiar

$$\hat{\beta} = (X'X)^{-1} X'y \tag{7}$$

When k > 0, $\hat{\beta}(k)$ is a "biased" estimator of the true unknown coefficient vector. However,
it can be shown that by allowing a little bias into the system, one obtains an estimator
with a smaller total mean squared error value than by using OLS procedures. This may be
stated analytically as

$$E[(\hat{\beta}(k) - \beta)'(\hat{\beta}(k) - \beta)] < E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)]$$

Using ridge estimation, then, entails adopting minimum mean square error (MSE) as a
general criterion in place of the customary ordinary least squares criterion.
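
A minimal sketch of the ridge estimator, together with a small simulation of the mean squared error comparison, is given below. Everything in it is an assumption made for illustration: the nearly collinear design, the "true" coefficient vector, the unit error variance, and the choice k = 1 are all hypothetical. The outcome of such a comparison depends on those choices, so the simulation illustrates the inequality rather than proving it.

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])            # fixed design, no intercept
beta_true = np.array([1.0, 1.0])         # assumed "true" coefficients

# One sample: OLS and ridge estimates side by side.
y_demo = X @ beta_true + rng.normal(size=n)
b_ols = ridge_coefficients(X, y_demo, 0.0)
b_ridge = ridge_coefficients(X, y_demo, 1.0)

def mean_squared_error_of(k, reps=500):
    """Average squared distance between the estimate and beta_true over repeated samples."""
    errors = []
    for _ in range(reps):
        y = X @ beta_true + rng.normal(size=n)
        b = ridge_coefficients(X, y, k)
        errors.append(np.sum((b - beta_true) ** 2))
    return float(np.mean(errors))

mse_ols = mean_squared_error_of(k=0.0)
mse_ridge = mean_squared_error_of(k=1.0)

print("OLS estimate:  ", np.round(b_ols, 3))
print("Ridge estimate:", np.round(b_ridge, 3))
print(f"Simulated MSE, OLS:   {mse_ols:.3f}")
print(f"Simulated MSE, ridge: {mse_ridge:.3f}")
```

With real data the true coefficients are unknown, so a comparison of this kind is only possible in simulation.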

Procedures for Selecting k 

Choosing a value for k is critical in using a ridge estimator. Hoerl and Kennard's
1970 article introducing ridge regression to the scientific community suggests that
guidelines for selecting a particular value for k are straightforward:

1. At a certain value of k, the system will stabilize and have the general character
   of an orthogonal system.

2. Coefficients will not have unreasonable absolute values with respect to factors
   for which they represent rates of change.

3. Coefficients with improper signs at k=0 will have changed to have proper signs.

4. The residual sum of squares will not have been inflated to an unreasonable value.
   It will not be large relative to the minimum residual sum of squares or large
   relative to what would be a reasonable variance for the process generating the
   data. (p. 65)

Unfortunately, the guidelines are no more than general signposts. In reality, the
optimal value of k cannot be determined with certainty (i.e., in terms of a closed-form
solution) because it depends on the unknown parameter vector $\beta$ and the unknown error
variance $\sigma^2$. In practice, k must be determined subjectively or estimated from the data
(Mayer and Wilkie, 1973; Judge et al. 1980). This characteristic of k is at the root of
the difference of opinion regarding the value of ridge estimation. Some would argue that
the reduction in mean square error gained by the introduction of bias into the system has
little value because of our inability to select the amount of bias in an optimal manner.
To put it another way, we cannot evaluate any gain in accuracy for a particular problem
without knowing the true values of the coefficients. Proponents of ridge techniques
counter by claiming that one can use the data in a particular problem to help select a
value of k that will produce an estimator superior to OLS. The rejoinder to that argument
is that a k selected from the data is itself stochastic, whereas OLS estimators (k = 0) and
ridge estimators based on a fixed k involve a nonstochastic k. Thus it is argued that
selecting a value for k based on sample data makes it improper to apply standard
statistical tests (such as t-scores) in the ridge environment (Darlington 1978; Judge et
al. 1980). We will return to this important problem later.

A number of specific techniques have been developed for estimating the value of k.
Each has its proponents. In the final analysis, the choice of technique depends on the
assumptions the investigator is willing to make. Two techniques for estimating k were
incorporated in this study: the ridge trace (Hoerl and Kennard, 1970) and the harmonic mean
(Hoerl, Kennard, and Baldwin, 1975).

The ridge trace approach was used because it provides a description of (1) the
severity of ill-conditioning in the data; (2) how collinearity conditions affect
estimation; and (3) how increasing the degree of bias introduced into the regression model
affects coefficient estimation. The approach entails introducing a specific amount of
bias into the model and plotting the resultant biased coefficients against the bias value.
The primary drawback of the ridge trace approach is that it does not provide a point
estimate (i.e., a closed-form solution) for k. The technique requires the analyst to
visually examine the plot and make a subjective decision about where (i.e., at what value
of k) the solution appears to stabilize. Because the technique makes no assumptions about
the nature of the closed-form solution but allows the analyst to plot the consequences of
introducing all feasible bias values, ridge trace plots can simultaneously provide and
depict the relationship between all feasible closed-form solutions. The ridge trace
procedure is formally developed in Appendix III.
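
The mechanics of a ridge trace can be sketched in a few lines: standardize the variables, solve the ridge system over a grid of k values, and tabulate (or plot) the coefficients against k. The function name, the simulated data, and the particular grid below are assumptions for illustration; this is not the algorithm of Appendix III.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Return standardized ridge coefficients, one row per value of k."""
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # unit-length columns: Z'Z is a correlation matrix
    yc = y - y.mean()
    yc = yc / np.sqrt(yc @ yc)                # unit-length response
    p = Z.shape[1]
    ZtZ, Zty = Z.T @ Z, Z.T @ yc
    return np.array([np.linalg.solve(ZtZ + k * np.eye(p), Zty) for k in ks])

# Simulated data (assumed): x1 and x2 are nearly collinear, x3 is unrelated to them.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 0.5 * x3 + rng.normal(size=n)

ks = np.array([0.0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.2])
trace = ridge_trace(X, y, ks)
squared_length = (trace ** 2).sum(axis=1)     # shrinks as k grows

for k, row in zip(ks, trace):
    print(f"k={k:5.3f}  " + "  ".join(f"{b: .3f}" for b in row))
```

Reading down the printed columns is the tabular equivalent of inspecting the plot: the analyst looks for the value of k beyond which the coefficients change only slowly.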

The harmonic mean approach was used because (1) it provides a relatively simple
procedure for calculating the bias parameter, i.e.,

$$k = \frac{p \, \hat{\sigma}^2}{\hat{\beta}'\hat{\beta}} \tag{8}$$

(2) its assumptions are simple and relatively easy to understand, making the procedure
readily employable by the lay analyst; and (3) the procedure provides estimated values for
k which appear to have optimal properties under varying conditions of collinearity. The
rationale for the approach is presented in Appendix IV.
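
The harmonic mean estimate of k can be computed directly from an OLS fit. The sketch below assumes the usual reading of the formula: p is the number of explanatory variables, the variance term is the OLS residual mean square, and the coefficient vector is the OLS estimate for unit-length standardized predictors. The function name and the simulated data are hypothetical. The last lines check the property that gives the method its name, namely that the result equals the harmonic mean of the component-wise values $\hat{\sigma}^2 / \hat{\alpha}_i^2$ in the principal-axis (canonical) form of the model.

```python
import numpy as np

def harmonic_mean_k(X, y):
    """k = p * s^2 / (b'b), from OLS on unit-length standardized predictors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    yc = y - y.mean()
    b = np.linalg.solve(Z.T @ Z, Z.T @ yc)     # OLS coefficients (k = 0)
    resid = yc - Z @ b
    s2 = (resid @ resid) / (n - p - 1)         # residual mean square
    return p * s2 / (b @ b), b, s2, Z

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)             # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 0.5 * x3 + rng.normal(size=n)

k_hkb, b, s2, Z = harmonic_mean_k(X, y)

# Canonical form: rotate the coefficients onto the eigenvectors of Z'Z.
eigvals, V = np.linalg.eigh(Z.T @ Z)
alpha = V.T @ b
k_i = s2 / alpha ** 2                          # component-wise values
k_harmonic = len(k_i) / np.sum(1.0 / k_i)      # their harmonic mean

print(f"k from p * s^2 / b'b:    {k_hkb:.5f}")
print(f"harmonic mean of the k_i: {k_harmonic:.5f}")
```

Because the value of k is computed from the sample, it is subject to the objection raised earlier about data-dependent choices of k.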

Ridge vs. OLS: Model for Testing

The results of numerous studies are available to the reader interested in theoretical
comparisons of the merits of OLS versus ridge estimation procedures (Dempster, Schatzoff
and Wermuth, 1977). Our concern here is not with attempts at proving that ridge
estimators are always to be preferred when predictor variables are highly correlated.
Arguments by Judge et al. (1980) and many others clearly demonstrate that ridge estimators
have many undesirable properties and, furthermore, lack many of the desirable properties
claimed for them. Our intent here is to show how ridge estimators can be useful in a
practical context, providing both insights into the effects of multicollinearity and a
viable means of mitigating some of those effects. Hence, rather than set up an artificial
data set to use as a basis for comparing the results of OLS versus ridge estimates, actual
data relating to a typical cost estimation problem in higher education are used in what
follows. An artificial data set would have the advantage of permitting knowledge of the true
coefficients and true variances, around which comparisons could be made. It seemed more
important, however, to show what working with ridge estimators is like under the normal
condition of uncertainty.

The cost estimation problem to be used for testing purposes has been reported on
earlier (Brinkman 1983). In that study, marginal costs for full-time and part-time
students were estimated and compared for several standard expenditure categories at public
two-year colleges. A translog joint cost function was developed and subsequently
estimated by a ridge technique. For present purposes, the same model and data set will be
used to compare ridge and OLS results. Initially, then, the testing procedure will in
effect be looking behind the scenes to show what, if anything, was gained by using ridge
regression rather than OLS. Additional comparisons will be made between OLS and ridge (at
two different values of k) using progressively smaller samples (randomly chosen subsets of
the original full sample), multiple samples of the same (small) size, sets of coefficients
derived from one sample to estimate marginal costs for another sample, and the
reestimation of a sample following the removal of outlier cases. Since the true values of
the parameters are not known, the comparisons can only be suggestive, and not definitive.
Nonetheless, by observing the results of working with real data, the potential user of the
ridge procedures may gain some useful insights as to their practical utility.

The cost function used to estimate marginal costs at two-year colleges contains the
following variables. The dependent variable is total instructional expenditures (in the
original study, expenditures for student services and for educational and general purposes
were also analyzed). The independent variables include: as outputs, the number of
full-time students (FTS), the number of part-time students (PTS), and the number of
non-credit students (NCS); as input price, the salaries paid to full-time faculty (SAL);
and as technological conditions, the proportion of degree earners (DEG), the proportion of
relatively high-cost programs (HCP), and the system status of the campus (CSS). The last
variable listed was in dummy form (1 or 0 depending on whether the institution had
independent status or was part of a system), and was not interacted. The single price
variable was not interacted either, in the absence of any substitution possibilities, but
was in logarithmic form. All other variables were logged, squared, and interacted in
standard translog form.

The data are taken from the 1979-80 Higher Education General Information Surveys,
except for the data on non-credit enrollments, which came from the American Association of
Community and Junior Colleges' directory. The full sample consisted of all institutions
that had complete data and were not a branch campus, except for a handful of outliers
which were removed from the sample. The resulting sample contained 779 institutions, or
about 75 percent of all public two-year colleges in 1979-80.

Results

Table 1 shows the estimated coefficients for the full sample using OLS and two ridge
estimates. In looking at the coefficients (B) and their standard errors (SE) as estimated
by OLS, one finds remarkably little outward evidence of collinearity. Roughly half of the
coefficients are statistically significant (|B/SE| > 1.96, as measured using OLS estimates),
and most of their signs are plausible. One unexpected result is the sign on the estimated
coefficient for FTS. Since what we know about the costs of instruction suggests that the
number of full-time students is usually the most important single determinant of total
costs, it is surprising that the coefficient on full-time students (FTS) should be
negative and statistically insignificant, instead of being positive and significant.

Despite what is suggested by the OLS coefficients, however, the variables are in fact
highly collinear. The variance inflation factors (VIFs) make this quite clear. Since a
VIF of 1 is equivalent to orthogonality, it is clear that only a couple of variables, SAL
and CSS, are relatively free of collinearity. The high VIFs on the remaining variables
indicate a high degree of imprecision in the estimated coefficients.

The zero-order correlations among the explanatory variables may lend insight into
sources of the collinearity problem. Table 2 shows the correlations for a subset of the
variables in the model. The table clearly shows the "model-induced" collinearity
discussed earlier. Some variables have a correlation of .99 or more with their squares,
and some interaction terms have well over a .90 correlation with one or both of the
interacted terms. By contrast, the "natural" collinearity among the variables shown only
runs as high as .688 (FTS with PTS), and is usually much less than that. Of course,
zero-order correlations typically will understate the degree of collinearity in the
system, as they reveal nothing of the collinearity which is due to combinations of
variables. The VIFs do reflect the latter source of collinearity, however, and thus
provide better insight into the extent and location of multicollinearity in a given model.
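
The point that zero-order correlations can understate collinearity is easy to reproduce. In the simulated example below (hypothetical data, not the study's), one variable is nearly the sum of three unrelated variables: no pairwise correlation is especially alarming, yet every VIF is far above the rule-of-thumb value of 10.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400

# Three unrelated variables and a fourth that is nearly their sum (assumed setup).
x1, x2, x3 = rng.normal(size=(3, n))
x4 = x1 + x2 + x3 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

R = np.corrcoef(X, rowvar=False)

# Largest zero-order correlation between two different variables.
max_pairwise = float(np.max(np.abs(R - np.eye(4))))

# VIFs from the diagonal of the inverse correlation matrix.
vifs = np.diag(np.linalg.inv(R))

print(f"Largest pairwise correlation: {max_pairwise:.3f}")
print("VIFs:", np.round(vifs, 1))
```

The VIFs detect the dependency because each is based on a regression involving all of the other variables, whereas a zero-order correlation looks at only two variables at a time.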

Table 1

Regression Results Using Alternative Estimators

Estimator        B       SE       VIF            B       SE       VIF

                 ...                             ...
K=.003          ...     ...     52.03          .640    .062      1.94
K=.200         .008    .000       .11          .617    .043       .94

                 NCS                             (NCS)^2
OLS           -.028    .036    196.17          .008    .001     15.24
K=.003        -.038    .019     55.83          .007    .001     13.39
K=.200        -.007    .001       .19          .002    .000       .74

                 DEG                             (DEG)^2
OLS           -.985    .304    205.84          .136    .033    100.61
K=.003        -.550    .144     46.49          .108    .019     33.75
K=.200        -.089    .010       .20          .001    .002       .31

                 HCP                             (HCP)^2
OLS           -.229    .114    143.76          .023    .007     11.77
K=.003        -.218    .066     47.52          .022    .006     10.35
K=.200        -.013    .005       .25          .007    .002       .80

                 (FTS)(NCS)                      (FTS)(DEG)
OLS           -.008    .005    194.75          .056    .038    243.48
K=.003        -.003    .003     55.57          .007    .015     37.16
K=.200         .001    .000       .21          .029    .001       .32

                 (FTS)(HCP)                      (PTS)(NCS)
OLS            .004    .016    153.93          .002    .003     83.49
K=.003         .022    .009     46.53         -.001    .002     34.70
K=.200         .011    .001       .27         -.000    .000       .43

                 (PTS)(DEG)                      (PTS)(HCP)
OLS           -.035    .023    180.15          .013    .011     88.49
K=.003        -.021    .011     40.99         -.004    .007     38.28
K=.200        -.001    .001       .30         -.000    .001       .26

                 (NCS)(DEG)                      (NCS)(HCP)
OLS           -.002    .007     93.98          .007    .003     26.13
K=.003        -.001    .005     39.17          .005    .003     19.65
K=.200        -.001    .000       .33         -.000    .001       .63

                 (DEG)(HCP)                      CSS
OLS            .012    .023     80.61          .005    .019      1.03
K=.003         .011    .015     34.68          .007    .019      1.02
K=.200        -.004    .002       .36          .018    .016       .70
Table 2

Zero-Order Correlations for a Subset of Variables

               TC      FTS    (FTS)2    PTS    (PTS)2  (FTS)(PTS)   HCP   (FTS)(HCP)

TC            1.000
FTS            .924   1.000
(FTS)2         .927    .997   1.000
PTS            .744    .688    .694   1.000
(PTS)2         .781    .723    .732    .990   1.000
(FTS)(PTS)     .885    .880    .887    .945    .961    1.000
HCP            .109    .070    .054   -.068   -.064    -.023     1.000
(FTS)(HCP)     .458    .449    .434    .205    .221     .318      .914    1.000
Just how effective the VIFs are in pinpointing multicollinearity is a matter of
controversy (see Judge et al. 1980 for a discussion).



Table 1 also shows the effects of introducing bias into the estimating system by means of
the ridge procedure. Notice first the rapid reduction in VIFs for variables with extreme
VIF values. Note also that the regression coefficients and their standard errors
change, depending on the amount of bias (the value assigned to k) introduced. We will
discuss below the issues surrounding the choice of the amount of bias. For now, it is
enough to note that K=.003 is a relatively small amount and K=.2 is a relatively large
amount of bias for this particular situation, and that both values are plausible choices.

As shown in table 1, even at K=.003, the sign on FTS has switched from negative to
positive, and the standard error has become small relative to the estimated coefficient.
Additional bias does not change these desirable new features. Not all changes induced by
the bias are as welcome. According to the OLS estimate, the sign on (FTS)(PTS) is
negative. This result is certainly theoretically acceptable, as it indicates the
existence of economies of scope: it is less expensive to instruct full-time and part-time
students together, i.e., at the same institutions, than to do so separately.
Unfortunately, one might say, the introduction of increasing amounts of bias into the
estimating procedure eventually leads to a sign switch on (FTS)(PTS). Since this opposite
result is also theoretically plausible, we are left with no substantive basis for arguing
on behalf of either result. In other words, in the absence of a clear theoretical
direction, it is difficult to feel comfortable with a sign change (especially when the
standard errors are relatively small in both cases). Particular trouble with this
variable might have been expected, as it had the highest VIF of any variable in the model.
The ridge trace procedure allows us to "see" the effects of introducing bias into the
system. Figure 1 shows the trace (the value of the standardized coefficient) as a
function of the value of k for each of several variables. Figures 2 and 3 show the traces
for additional variables in the model. The traces of the coefficients with high VIFs are
much more sensitive to the amount of bias in the system. The reason the ridge
procedure is attractive when collinearity is a problem is the way in which it stabilizes
or "tames" (Kennedy 1979) badly behaving coefficients. In other words, with enough bias,
the coefficients of highly collinear variables can be made to behave as consistently as
the coefficients of non-collinear variables. In part this is accomplished by reducing the
absolute magnitude of the coefficients with respect to their OLS values. Interestingly,
variables that are collinear and of little consequence in the model have their coefficients
reduced in magnitude to such an extent that they are, in effect, removed from the model.
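A ridge trace of the kind described here can be sketched as follows. This is a minimal illustration assuming NumPy and synthetic data (the variables and coefficients are hypothetical); the standardized coefficients are computed as (R + kI)^(-1) r, where R is the predictors' correlation matrix and r their correlations with the dependent variable.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Standardized ridge coefficients for each k in ks (rows follow ks)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    n, p = Xs.shape
    R = Xs.T @ Xs / n      # correlation matrix of the predictors
    r = Xs.T @ ys / n      # correlations of the predictors with y
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

# Hypothetical data: one variable, its square, and an unrelated control.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(7.0, 1.0, n)
z = rng.normal(0.0, 1.0, n)
X = np.column_stack([x, x**2, z])
y = 0.5 * x + 0.02 * x**2 + 0.3 * z + rng.normal(0.0, 0.3, n)

ks = np.linspace(0.0, 0.015, 16)   # same k range as the traces in the text
trace = ridge_trace(X, y, ks)
for k, row in zip(ks, trace):
    print(f"k={k:.3f}  " + "  ".join(f"{b: .3f}" for b in row))
```

The first row (k=0) is the OLS solution in standardized form; plotting each column of `trace` against `ks` gives the trace for that variable.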

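While the behavior of a particular coefficient is of some interest, marginal cost
estimates in the present context are the result of a combination of coefficients.
Specifically, the marginal cost of an output q is equal to the first partial derivative of
the estimated cost function with respect to q, multiplied by the estimated value of total
cost for a particular value of q, divided by that value of q, or

\[ MC_q = \hat{y}\, f'_q / q \qquad (9) \]

where \hat{y} is the estimated value of total cost and f'_q is the first partial
derivative of the estimated cost function with respect to q.

In the present case, where the outputs of concern are full-time (FTS) and part-time
(PTS) enrollments, the respective marginal cost calculations are as follows:

\[ MC_F = (\alpha_F + \alpha_{FF}\,FTS + \alpha_{FP}\,PTS + \alpha_{FD}\,DEG + \alpha_{FH}\,HCP + \alpha_{FN}\,NCS)\,\hat{y}/FTS \qquad (10) \]

\[ MC_P = (\alpha_P + \alpha_{PP}\,PTS + \alpha_{FP}\,FTS + \alpha_{PD}\,DEG + \alpha_{PH}\,HCP + \alpha_{PN}\,NCS)\,\hat{y}/PTS \qquad (11) \]

Here each \alpha is the estimated coefficient belonging to the indicated term
(F = FTS, P = PTS, D = DEG, H = HCP, N = NCS).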
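The marginal cost calculation can be sketched in a few lines. The coefficient values, the total cost figure, and the dictionary layout below are hypothetical and chosen only for illustration; the sketch assumes the bracketed sum is the derivative of log cost with respect to log output, evaluated at log values of the variables.

```python
import math

def bracketed_sum(coefs, logs):
    """Own-term coefficient plus each remaining coefficient times the
    (log) value of the variable it multiplies."""
    return coefs["own"] + sum(coefs[name] * logs[name] for name in logs)

def marginal_cost(coefs, logs, total_cost, q):
    """Marginal cost = bracketed sum * estimated total cost / output level."""
    return bracketed_sum(coefs, logs) * total_cost / q

# Hypothetical coefficients for the six terms involving full-time students.
coefs_ft = {"own": 0.30, "FTS": 0.05, "PTS": -0.01,
            "DEG": -0.02, "HCP": 0.03, "NCS": -0.005}

# Evaluation point: the log-mean enrollments quoted in the text for a
# middle-range institution; the total cost figure is hypothetical.
logs = {"FTS": math.log(1150), "PTS": math.log(1366),
        "DEG": math.log(29.0), "HCP": math.log(36.2), "NCS": math.log(354)}
est_total_cost = 3.0e6

mc_ft = marginal_cost(coefs_ft, logs, est_total_cost, q=1150)
print(f"Illustrative full-time marginal cost: ${mc_ft:,.0f}")
```

Because all six coefficients enter the bracketed sum, an error in any one of them carries through to the marginal cost estimate.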
[Fig. 1. Ridge trace for key output variables, full sample. Standardized beta plotted against K-value (0.000 to .015) for FTS, PTS, (FTS)(PTS), (FTS)(FTS), and (PTS)(PTS).]

[Fig. 2. Ridge trace for subset of control variables, full sample. Standardized beta plotted against K-value (0.000 to .015) for SAL, DEG, HCP, CSS, and NCS.]

[Fig. 3. Ridge trace for subset of interaction variables, full sample. Standardized beta plotted against K-value (0.000 to .015) for (FTS)(DEG), (PTS)(DEG), (FTS)(HCP), and (PTS)(HCP).]



With respect to the concerns being addressed in this paper, it is perhaps worth
noting the obvious: the accuracy of a marginal cost estimate for the model at hand will
depend on the accuracy of six estimated coefficients. Table 3 shows the results of using
these formulas with the three sets of coefficients shown in table 1. The calculations
have been done for four levels of output in order to contrast the respective estimated
costs across the observed range of enrollment. The category "small institutions" refers
to institutions lying within the smallest five percent of those in the sample (as measured
by enrollment). Data on 10 such institutions, randomly chosen, were averaged to create a
data set for a "typical" small institution: 284 full-time and 221 part-time students. In
a similar fashion, data for a typical large institution were created: 4,665 full-time and
12,885 part-time students. Between these extremes, two types of middle-range institutions
are also represented in table 3. Section C shows the results of using raw enrollment
means for the entire sample (1,645 full-time and 2,840 part-time students) to represent one
such institution. Section B shows the results of using the logarithmic enrollment means
for the entire sample (1,150 full-time and 1,366 part-time students) to represent the other.
The raw data distributions are positively skewed, so the means of the logarithmic data are
smaller. Fully two-thirds of all the institutions in the sample have enrollments equal to
or less than the raw mean values.

In order to evaluate marginal costs at these various enrollment levels, values for
the other independent variables in the model must also be selected. For the results shown
in table 3, the following conditions were imposed: the raw mean values for percent of
degree completion (29%) and percent of high cost programs (36.2%) were used in all
sections; with respect to noncredit enrollment, the average of actual values was used for
section A (165 students), the log mean value for section B (354 students), and the raw
mean value for sections C and D (4,335); and for faculty salaries the log mean value was
used for section B ($18,215), the raw mean value for section C ($18,578), and the average
of actual values for sections A ($13,625) and D ($23,090). Neither degree completion nor
Table 3

Marginal Costs of Instruction
Alternative Estimates
Full Sample

Institutional Size                      Ridge      Ridge
and Student Type             OLS        K=.003     K=.2

A. Small
   FT                       $1057      $1335      $1436
   PT                       $ 349      $ 245      $ 349
   FT/PT                     3.03       5.45       4.11

B. Middle (Log Means)
   FT                       $1494      $1500      $1455
   PT                       $ 290      $ 265      $ 258
   FT/PT                     5.15       5.65       5.64

C. Middle (Raw Means)
   FT                       $1431      $1542      $1575
   PT                       $ 223      $ 208      $ 198
   FT/PT                     6.42       7.41       7.95

D. Large
   FT                       $1871      $1941      $1809
   PT                       $ 179      $ 194      $ 151
   FT/PT                    10.45       9.94      11.98
program emphasis were correlated with full- or part-time enrollment levels, and thus both
could be left at their respective mean values. Noncredit enrollment to some extent and
salaries to a considerable extent were correlated with full- and part-time enrollment
levels; thus values other than those at the mean were required to adequately represent
typical combinations of institutional characteristics across the (credit) enrollment
spectrum.

In terms of the underlying management finance issues, especially tuition levels and
appropriations or allocations per FTE, both the absolute value of the marginal costs for
full- and part-time students, and the ratio between the two costs, are important. As
table 3 shows, there are differences in the results by size of institution and by type of
estimating procedure. The only result shown that appears somewhat implausible is the OLS
estimate for full-time students at small institutions. On theoretical grounds we would
expect that the cost curve, partially depicted by the four "points" shown in table 3,
would be U-shaped, and there is evidence to that effect (Brinkman 1981). For very small
institutions, estimated marginal costs would escalate rapidly according to the two ridge
procedures, but would continue to decline according to OLS (not tabled). In part, the
reason for the OLS result is the negative coefficient on FTS which was mentioned earlier.
It could be argued, then, that the ridge technique "corrects" the sign on that coefficient
and thereby produces a better estimate of marginal costs, particularly for the smaller
institutions in the sample. (For those readers perplexed by the ability of small
institutions to have lower marginal costs than the mid-sized institutions, as shown in
table 3, we note that the primary reason is lower faculty salaries at the small
institutions. If the small institutions paid their faculty at the [raw] mean rate for the
sample [$18,215], instead of $13,625, they would in fact have higher marginal costs than
the mid-sized institutions. See Brinkman [1983] for more details.)
It is also useful to look at some statistics for the system as a whole, in order to
see what else happens beyond changes in estimated coefficients as bias is introduced. As
shown in table 4, one result is a decrease in R², the conventional measure of "goodness of
fit." Of course, R² must decrease since by definition OLS will always give the best fit
measured in that way. Similarly, the increases in the sums of squares (SSQ) and in the
standard error (SE) are to be expected. Notice that the changes are quite small, which
means that we can perturb this system quite considerably without losing much predictive
power. In any case, the tradeoff is that the mean square error (MSE) is drastically
reduced as bias is introduced. As Judge et al. (1980) emphasize, the reduction in MSE
cannot be guaranteed, but obviously there is no question that the reduction is both real
and substantial for the particular model estimated in this study.

Another way of comparing the OLS and ridge estimators is to look at their respective
results for varying sample sizes. As noted earlier, an abundance of data (i.e.,
observations) is usually a good antidote to multicollinearity. The problem, of course, is
that numerous cases are not always available, or their acquisition may be expensive, and
so on. Thus the analyst may often have to make do with a relatively small sample, and to
look elsewhere, such as to an alternative estimator, for help in handling
multicollinearity.

Table 5 shows the marginal-cost results, at mean (log) values of the variables, of
estimating the translog cost function for a series of decreasing sample sizes, starting
with the full sample (N=779). The smallest of the randomly drawn subsamples, N=50,
retains 27 degrees of freedom. Overall, the least amount of difference among the various
samples, as measured by the range of values, is found in the estimates generated by the
ridge program when k=.2. This is particularly true for the cost estimates for full-time
students. The ridge program at k=.2 also does the best job for subsample 50(a), as the
alternative procedures, OLS and ridge at k=.003, lead to implausible results in the form
Table 4

Overall Statistics for the Cost Function Model Using Alternative Estimators

                      R²      SSQ      SE       MSE     MAX VIF

OLS                  .921    .081    .285     333.0     680.4
Ridge (k=.003)       .919    .083    .288     129.9      55.8
Ridge (k=.200)       .903    .100    .316       7.3        .9
Table 5

Marginal Cost Estimates Using OLS and Ridge Estimators
on Different Sample Sizes

Number of            OLS                  RIDGE (k=.003)            RIDGE (k=.2)
Cases          FT     PT    FT/PT       FT     PT    FT/PT       FT     PT    FT/PT

779          $1494   $290    5.15     $1500   $265    5.65     $1455   $258    5.64
...          $1387   $246    5.64     $1438   $209    6.88     $1435   $235    6.11
176(a)       $1509   $276    5.47      ...    $223    ...       ...     ...    ...
176(b)       $1218   $323    3.77     $1261   $297    4.25     $1338   $259    5.17
100          $1390   $359    3.87     $1571   $300    5.24     $1433   $247    5.80
50(a)        $1716   $-64             $1605   $-56             $1385   $ 91   15.22
50(b)        $1431   $266    5.38     $1317   $236    5.58     $1393   $183    7.61
of negative marginal costs for part-time students. However, while the estimate of
part-time marginal costs is positive at k=.2, the estimated value of $91 is only about a
third of the value estimated on the basis of the full sample; nonetheless, the estimate is
certainly less misleading than those derived from OLS or ridge at k=.003. The same two
bias parameter values, .003 and .2, were used for each sample simply for ease of
exposition. Normally, the selection of a value for k would be sample specific, as will be
discussed below.

Given the amount of collinearity in the system, we might expect considerable
variability in the coefficients, and thus the marginal cost estimates, from one randomly
drawn small sample to another. Using table 5 again, we see that the marginal cost
estimates derived from the first sample of 50 institutions (a) differ considerably from a
second sample of the same size (b). The estimates based on OLS and ridge at k=.003 vary
more than those based on ridge at k=.200; at the same time the two former estimates gave
results for part-time students and for FT:PT that are closest to those derived from using
the full sample. The picture is again somewhat mixed for the two samples (a and b) at
N=176, although the cross-sample stability of the ridge estimates at k=.200 is remarkable.
Also, even though OLS and ridge at k=.003 give good results for some of the small samples,
they do so using negative coefficients on FTS (not tabled), which suggests that those
estimators might yield implausible results for very small institutions, as opposed to
institutions with mean or larger enrollments.

High collinearity tends to make regression coefficients highly sensitive to the
inclusion in the sample of particular cases, especially outliers. Thus another way in
which OLS and ridge estimators can be compared is their respective reaction to the removal
of cases from the sample. The more stability, that is, the less the change in the
coefficients, the better.
Table 6 shows the results of removing two cases from the sample N=100. The two cases
removed were outliers in the sense that their predicted total costs were furthest (about 2
standard deviations) from their actual total costs among the institutions in the
subsample. Table 6 shows two kinds of comparisons. Panel A contains marginal cost
results, while Panel B contains the estimated coefficients for a subset of the variables
in the model (those that are directly involved in the calculation of the marginal effects
of FT and PT). As can be seen from the percentage change calculations, the ridge
procedure at k=.200 provides considerably more stability than the OLS procedure, with
respect to both the estimated marginal costs and the underlying regression coefficients.
The ridge procedure at k=.003 generally yields more stable coefficients than OLS, but not
in all instances.
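The case-removal comparison can be sketched as below. The data are synthetic and the helper names are hypothetical; as a simplifying assumption, the outliers are taken to be the cases with the largest absolute OLS residuals, and the percent change is computed on unstandardized slopes.

```python
import numpy as np

def ridge_slopes(X, y, k):
    """Unstandardized ridge slopes (k=0 gives OLS), fitted on
    standardized predictors and then rescaled."""
    sd = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / sd
    n, p = Xs.shape
    g = np.linalg.solve(Xs.T @ Xs / n + k * np.eye(p),
                        Xs.T @ (y - y.mean()) / n)
    return g / sd

def pct_change_after_dropping_outliers(X, y, k, n_drop=2):
    """Percent change in each slope after removing the n_drop cases
    with the largest absolute OLS residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    keep = np.argsort(np.abs(resid))[:-n_drop]
    before = ridge_slopes(X, y, k)
    after = ridge_slopes(X[keep], y[keep], k)
    return 100.0 * np.abs(after - before) / np.abs(before)

# Hypothetical collinear sample of 100 cases.
rng = np.random.default_rng(5)
x = rng.normal(7.0, 1.0, 100)
z = rng.normal(0.0, 1.0, 100)
X = np.column_stack([x, x**2, z])
y = 0.5 * x + 0.02 * x**2 + 0.3 * z + rng.normal(0.0, 0.5, 100)

changes = {k: pct_change_after_dropping_outliers(X, y, k)
           for k in (0.0, 0.003, 0.2)}
for k, c in changes.items():
    print(f"k={k:<5}  % change per coefficient: {np.round(c, 1)}")
```

Smaller percentages indicate coefficients that are less sensitive to the removal of the two cases.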

Variability among subsamples can be examined in yet another way. The estimated
coefficients from one sample can be used with the values of the variables from a second
sample to yield predicted total costs for the second sample. These predictions can then
be correlated with actual total costs across the second sample, with the degree of
correlation expressed as the familiar R². The question for present purposes is whether
coefficients estimated by OLS will do better or worse than those estimated by ridge with
respect to the amount that R² will shrink when the original coefficients are used with a
new sample. Results of such a comparison for two randomly drawn subsamples (N=100) are
shown in table 7. All three estimators yield high R² values for the original sample, with
the bias-related decrease in R² again being evident (as in table 4). The
cross-verification procedure (Daniel and Wood, 1980) shows much less shrinkage in R² for
the ridge estimators. Ridge at k=.2 is especially resistant to shrinkage in this
instance.
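The shrinkage comparison can be sketched as follows, assuming NumPy and two synthetic samples (hypothetical data and function names). Coefficients are fitted on the first sample, applied to the second, and R² is computed as the squared correlation between actual and predicted values.

```python
import numpy as np

def fit_ridge(X, y, k):
    """Return (intercept, slopes) from ridge on standardized predictors;
    k=0 reproduces OLS."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    n, p = Xs.shape
    g = np.linalg.solve(Xs.T @ Xs / n + k * np.eye(p),
                        Xs.T @ (y - y.mean()) / n)
    slopes = g / sd
    return y.mean() - mu @ slopes, slopes

def r2(y, yhat):
    """Squared correlation between actual and predicted values."""
    return np.corrcoef(y, yhat)[0, 1] ** 2

def make_sample(rng, n):
    """Hypothetical collinear sample: a variable, its square, a control."""
    x = rng.normal(7.0, 1.0, n)
    z = rng.normal(0.0, 1.0, n)
    X = np.column_stack([x, x**2, z])
    y = 0.5 * x + 0.02 * x**2 + 0.3 * z + rng.normal(0.0, 0.5, n)
    return X, y

rng = np.random.default_rng(2)
X1, y1 = make_sample(rng, 100)   # original sample
X2, y2 = make_sample(rng, 100)   # second sample

results = {}
for k in (0.0, 0.003, 0.2):
    b0, b = fit_ridge(X1, y1, k)
    original = r2(y1, b0 + X1 @ b)
    second = r2(y2, b0 + X2 @ b)
    results[k] = (original, second, original - second)
    print(f"k={k:<5}  original R2={original:.3f}  "
          f"second R2={second:.3f}  change={original - second:.3f}")
```

On the original sample OLS necessarily has the highest R²; whether ridge shrinks less on the second sample depends on the data at hand, so the sketch only shows how the comparison is set up.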

A sample size of 100 was chosen for this test because the ratio between the number of
cases (N) and the number of variables in the model (P) was not extreme in either
Table 6

Effects of Removing Two Cases from a Small Sample (N=100)

                       OLS                     Ridge (K=.003)              Ridge (K=.200)
                               %                           %                           %
              N=100    N=98   Change      N=100    N=98   Change      N=100    N=98   Change

A. Marginal Cost Estimates

FT            $1390   $1296    6.8%       $1571   $1542    1.8%       $1434   $1489    3.8%
PT            $ 359   $ 319   11.1%       $ 300   $ 274    8.7%       $ 247   $ 252    2.0%
Ratio          3.87    4.06    4.9%        5.24    5.63    7.4%        5.81    5.91    1.7%

B. Unstandardized Regression Coefficients

FT           -1.188  -1.469   23.7%       -.003   -.029  866.7%        .241    .238    1.2%
(FT)2          .097    .095    2.1         .045    .044    2.2         .018    .018    0.0
PT             .368    .340    7.6         .136    .135    0.7         .024    .022    8.3
(PT)2          .008    .003   62.5         .010    .008   20.0         .003    .003    0.0
(FTxPT)       -.019   -.003   84.2        -.006   -.004   33.3         ...     ...     ...
(FTxDEG)      -.128   -.202   57.8        -.046   -.059   28.3         .027    .026    3.7
(FTxHCP)       .304    .417   37.2         .088    .103   17.0         .019    .020    5.3
(FTxNCS)      -.007   -.007    0.0        -.003   -.003    0.0         .001    .001    0.0
(PTxDEG)       .004    .021  425.0        -.020   -.019    5.0         .000    .000    0.0
(PTxHCP)      -.058   -.077   32.8        -.013   -.013    0.0         .002    .002    0.0
(PTxNCS)       .003    .000   87.3         .003    .003    0.0         .000    .000    0.0
Table 7

Comparison of R² Shrinkage* for Three Estimators

                                          R² Values

                              OLS     Ridge (K=.003)    Ridge (K=.2)

Original Sample (N=100)      .944          .938             .920

Second Sample (N=100)        .842          .900             .915

Change in R²                 .102          .038             .005

* When regression coefficients estimated on the basis of an original sample
are used to predict total costs for institutions in a second sample.
direction. The superiority of the ridge procedure with respect to R² shrinkage has been
shown to be directly related to the magnitude of the ratio of the number of predictors to
the number of cases (P/N) (Faden 1978). While not extreme, the ratio in this instance,
22/100, is actually fairly low compared to some that are reported in recent literature.
For example, in Cowing and Holtmann (1983), the ratio is 107/138, while in Brown, Caves, and
Christensen (1979) it is 21/67. It appears, then, that there is some likelihood of
encountering situations where the ridge procedure could be helpful, assuming, of course,
that maintenance of predictive power in a cross-validation sense has value.

The Bias Parameter Revisited 

In the previous section, it was shown that, at least in the particular situation being
analyzed in this study, the ridge estimators offered some advantages over the conventional
least squares approach. The ridge estimators provided theoretically better estimates for
marginal costs at small institutions based on a large sample size, more plausible
estimates when relatively small samples were used, less shrinkage in R² when coefficients
were used across samples, and more stable estimates when cases were removed from small
samples. But the ridge procedure does not provide a single alternative to OLS. Rather,
the procedure can generate a virtually unlimited number of alternatives (i.e., sets of
estimated coefficients), with each alternative being a function of the value assigned to
the bias parameter k. Unfortunately, as was pointed out earlier, the selection of a value
for k is anything but straightforward. This is not to say that there are no
straightforward procedures, but rather that there are alternative procedures which lead to
different values of k and no one procedure is acceptable to experts in the field.

In illustrating the capabilities of the ridge procedure in the previous section,
results (coefficients and marginal cost estimates) were shown for assigned k values of
.003 and .2. These values were chosen because they ranged from relatively small to
relatively large amounts of bias, and because they had intuitive appeal to the authors.
That is, they led to results (a change in signs or in the magnitude of coefficients) which
made sense. But what of the more rigorous methods suggested to assign a value to k? What
k values do these methods suggest for the model and data used in the present study?

We start by returning to the ridge trace procedure, which was developed early on by
the originators of the ridge estimator (Hoerl and Kennard 1970). One can see in figure 1
above why an analyst might elect to pick a value for k somewhere between .003 and .009.
Relatively little bias seems to accomplish a great deal in terms of stabilizing the
coefficients. As an alternative, using figure 4, which displays the trace over a greater
range of k, consider selecting a value for k of .200, which was used for some of the
estimates discussed above, or a value of .360, which is the value one obtains on the basis
of the harmonic mean technique (as derived in Appendix III). Figure 4 shows that little
is gained in terms of stable behavior by using a value greater than .2. For that matter,
very little additional stability is gained by using a value greater than .05. Are there
advantages in using the least amount of bias that gains the minimum acceptable stability?
Perhaps so, at least on an intuitive level. That is, for the analyst who considers the
introduction of bias as at best a necessary evil to combat multicollinearity, there may be
some utility in staying as close to the OLS solution as possible, although this position
is not justified in the literature. One seeming advantage of using k=.003 for the full
sample (N=779), for example, is that all the "t-scores" save that on FTS are of the same
sign and order of magnitude as the t-scores for OLS. Strictly speaking, B/SE cannot be
treated as a t-score in ridge (k>0), that is, a given value of the ratio cannot be
assigned a level of significance, because the sampling distribution for the statistic is
unknown when k is determined from the data (Obenchain, 1977). Practically speaking,
however, the analyst might still be willing to use the statistic in interpreting how well
the respective variables were performing if the amount of bias were very small.
Furthermore, the smaller the bias the less the increase in the residual sums of squares
for the model as a whole (see table 4).

[Fig. 4. Ridge trace for key output variables, full sample, showing extended range for K-values. Standardized beta plotted against K-value for FTS, PTS, (FTS)(PTS), (FTS)(FTS), and (PTS)(PTS).]
Figures 5 and 6 show the ridge trace for subsample 50(a), which was a "difficult"
sample for all three estimators (see table 5). Note that in this
instance coefficient stability is achieved with the introduction of somewhat more bias,
roughly .015 or so. Note that the value of the coefficient on FT continues to increase
all the way out to k=.50. Yet, the harmonic mean formulation suggests that k be set at
.036, an order of magnitude less than indicated for the full sample (N=779). For the
other very small subsample, 50(b), the harmonic mean formulation suggests that k be set at
.046. The appropriate choice for k, then, as noted earlier, is entirely sample specific,
and a function of a particular method of selection as well. Not the stuff, in other
words, likely to impress the purist. On the other hand, it does seem in looking at the
ridge traces that any amount of bias, within some range of k>0, would be a better choice
than staying with OLS, assuming that the stability of particular coefficients was of
greater concern than maximizing goodness of fit with respect to the predicted value of the
dependent variable.
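For readers who want to try a closed-form choice of k, the sketch below uses the commonly cited form of the Hoerl, Kennard, and Baldwin (1975) harmonic mean rule, k = p s² / (b'b), where b holds the OLS coefficients on predictors scaled to correlation form and s² is the residual variance estimate. This is offered as an assumption about the rule's usual statement, not as a reproduction of the derivation in Appendix III; the data and the function name are hypothetical.

```python
import numpy as np

def hkb_k(X, y):
    """Harmonic-mean (Hoerl-Kennard-Baldwin) choice of the ridge parameter."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc**2).sum(axis=0))    # unit-length columns: Z'Z is
                                             # the correlation matrix
    yc = y - y.mean()
    b = np.linalg.lstsq(Z, yc, rcond=None)[0]   # OLS on the scaled predictors
    resid = yc - Z @ b
    s2 = resid @ resid / (n - p - 1)         # residual variance estimate
    return p * s2 / (b @ b)

# Hypothetical collinear data.
rng = np.random.default_rng(6)
x = rng.normal(7.0, 1.0, 200)
z = rng.normal(0.0, 1.0, 200)
X = np.column_stack([x, x**2, z])
y = 0.5 * x + 0.02 * x**2 + 0.3 * z + rng.normal(0.0, 0.5, 200)

k_hat = hkb_k(X, y)
print(f"harmonic-mean k for this sample: {k_hat:.4f}")
```

Because s² and b'b both depend on the sample at hand, the value returned changes from sample to sample, which is consistent with the sample-specific behavior noted above.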

Discussion 

The purpose of this study was to assess whether, in the face of extreme
multicollinearity in estimating cost functions, the ridge procedure might be a useful
alternative to the conventional least squares estimator. Utility will depend, of course,
on perspective and need. As the problem was structured in the present study, the ridge
procedure appeared to offer several modest advantages. The task was to estimate marginal
costs for a multiproduct enterprise. Thus ridge improvements in the precision and
stability of estimated coefficients were important, marginal cost estimates being a
function of a set of coefficients. Similarly, related matters which are often of concern
in estimating cost functions but not pursued in the present study, such as economies of
scale and economies of scope, also depend on the value of estimated coefficients.
If, on the other hand, the objective was to predict total costs with the least amount
of error, then OLS has one immediate advantage. For any given sample, OLS will always
provide the lowest possible residual sum of squares. Recall, though, that even when
predicting total costs, the ridge procedure has a potential advantage when collinearity is
high. If the coefficients estimated for one sample are to be used to predict total costs
for the observations in another sample, the R² shrinkage incurred by a ridge estimator is
likely to be less than that for the OLS estimator; the resultant, shrunken R² values for
ridge estimators, then, may be higher than that for the corresponding OLS estimator.

There is another matter of perspective to consider, other than the specific aims of a
cost estimation procedure. Roughly speaking, one might describe it as the difference
between a theoretical and a practical perspective. On the basis of reviewing the
theoretically oriented literature, it appears as though there are serious, unresolved
problems with the ridge procedure (the best summary of these problems is in Judge et al.
1980). One might describe it simply as a situation in which the advantages offered by
ridge are possible, but cannot be guaranteed theoretically. Furthermore, the failure to
date to develop a theoretically unimpeachable way of assigning a value to the bias
parameter has weakened the case for ridge.

Looked at practically, however, the ridge procedure does seem to offer hope in the
battle against multicollinearity. In every comparison conducted for the present study,
ridge was in some pertinent sense superior to OLS. All in all, it appears that the
marginal cost estimates generated by ridge were less risky than those generated by OLS.
From a practical perspective that may be enough to justify using the ridge procedure.

Finally, it should be apparent from this study that at the very least ridge
regression provides a means for data and model exploration. By comparing OLS and ridge
estimates, and especially by examining ridge traces and VIFs, the analyst can come to a
better understanding of the effects of multicollinearity in a given situation. This is
true whether one opts for a reduced form model, that is, elects to eliminate some of the
collinear independent variables, or chooses to stay with a theory-driven model regardless
of the attendant estimation problems, as was done in this study.
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Appendix It Properties of Ridge Estimators 



The ridge estimator Is a linear transformation of the least squares estimator, which Is 
Just 

xf- wx)" x'y 

(12) 

Rearranging terms we obtain 

Substituting In equation (6), we obtain (14) 
For k > 0, B(k) Is the ridge estimator. 

The relationship of the rldge estimator of the OLS estimator Is then given by 
so that B(k) may be viewed as a linear transform of B. 



(15) 



(16) 



If the squared length of the regression vector $\beta$ is fixed at $R^2$, then B(k) is the value of
$\beta$ that gives a minimum sum of squares of residuals. This is illustrated in Figure 7 for
a two parameter problem by Marquardt and Snee (1975, p. 5) as follows:

The point B at the center of the ellipses is the least squares solution. At B the
sum of squares of residuals achieves its minimum value. The small ellipse is the
locus of points in the $\beta_1$, $\beta_2$ plane where the sum of squares $\phi$ is constant at a value
larger than the minimum value. The circle about the origin is tangent to the small
ellipse at B(k). Note that the ridge estimate B(k) is the shortest vector that will
give a residual sum of squares as small as the $\phi$ value anywhere on the small
ellipse. Thus the ridge estimate gives the smallest regression coefficients
consistent with a given degree of increase in the residual sum of squares.

Other key properties of B(k) include:

The length of B(k) is a decreasing function of k.

The variance term is a decreasing function of k. That is,

$$\text{variance} = \sigma^2 \, \mathrm{tr}[(X'X)^{-1} Z'Z]$$

The bias term is an increasing function of k. That is,

$$(\text{bias})^2 = \beta'(Z - I)'(Z - I)\beta$$

Together these give

$$ESD = \mathrm{tr}[\mathrm{Var}\, B(k)] + \beta'(Z - I)'(Z - I)\beta = \text{variance} + (\text{bias})^2 \qquad (17)$$

where ESD denotes the expected squared distance from B(k) to $\beta$, and $Z = [I + k(X'X)^{-1}]^{-1}$.

This last property points out that the mean square error of B(k) is composed of two
components: (1) the sum of variances of all the estimated coefficients; and (2) the
square of the bias introduced by substituting B(k) for B.
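
A small numerical sketch may help fix these properties. It relies on the standard eigenvalue forms of the variance and squared-bias terms given by Hoerl and Kennard (1970), which apply once X'X has been diagonalized. The eigenvalues, coefficients, and error variance below are invented for illustration, and the function names are hypothetical.

```python
# Sketch of how the variance and squared-bias terms move with k.
# Assumed eigenvalue forms (Hoerl and Kennard, 1970):
#   variance(k) = sigma^2 * sum(lam_i / (lam_i + k)^2)
#   bias^2(k)   = k^2 * sum(alpha_i^2 / (lam_i + k)^2)
# All numbers are hypothetical.

def variance_term(eigvals, sigma2, k):
    """Sum of the variances of the ridge coefficients."""
    return sigma2 * sum(lam / (lam + k) ** 2 for lam in eigvals)

def squared_bias_term(eigvals, alpha, k):
    """Squared bias introduced by shrinking with parameter k."""
    return k ** 2 * sum(a ** 2 / (lam + k) ** 2 for lam, a in zip(eigvals, alpha))

def expected_squared_distance(eigvals, alpha, sigma2, k):
    """ESD = variance + (bias)^2."""
    return variance_term(eigvals, sigma2, k) + squared_bias_term(eigvals, alpha, k)

eigvals = [1.9, 0.1]   # one small eigenvalue mimics strong collinearity
alpha = [1.0, 0.5]     # coefficients in the orthogonal space
sigma2 = 1.0
ks = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]

for k in ks:
    v = variance_term(eigvals, sigma2, k)
    b = squared_bias_term(eigvals, alpha, k)
    print(f"k={k:4.2f}  variance={v:8.4f}  bias^2={b:7.4f}  ESD={v + b:8.4f}")
```

For these made-up values the ESD at k = 0.1 is smaller than at k = 0, because the drop in variance outweighs the added squared bias.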
Appendix II

The algorithms reported in Table 1 require that (X'X) has been transformed to the space of
orthogonal predictor variables. In this form, the model expressed in equation (1) becomes

$$y = X^{*}\alpha + e \qquad (19)$$

where $X = X^{*}P$, $\alpha = P\beta$, $P'P = PP' = I$, $P(X'X)P' = \Lambda$, and $\Lambda$ denotes the diagonal matrix of
eigenvalues of (X'X).
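
The transformation to orthogonal predictors can be sketched in a few lines of Python. This is a hedged illustration, not the study's procedure: it assumes the convention that the rows of P are the eigenvectors of X'X (so that X* = XP' and X*'X* = P(X'X)P'), uses made-up data, and the helper `eig_sym_2x2` is a hypothetical name that handles only the two-predictor case.

```python
# Sketch: rotate two predictors into orthogonal coordinates.
# Assumed convention: rows of P are eigenvectors of X'X, so that
# X* = X P' and X*'X* = P (X'X) P' = Lambda. Data are made up.
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def eig_sym_2x2(m):
    """Eigenvalues and unit eigenvectors (as rows) of a symmetric 2x2
    matrix whose off-diagonal element is nonzero."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    eigvals = [mid + rad, mid - rad]
    rows = []
    for lam in eigvals:
        vx, vy = b, lam - a          # solves (a - lam) * vx + b * vy = 0
        norm = math.hypot(vx, vy)
        rows.append([vx / norm, vy / norm])
    return eigvals, rows

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
X = list(zip(x1, x2))
XtX = [[dot(x1, x1), dot(x1, x2)], [dot(x2, x1), dot(x2, x2)]]

eigvals, P = eig_sym_2x2(XtX)

# X* = X P': project each observation onto the eigenvector rows of P.
X_star = [[dot(row, P[0]), dot(row, P[1])] for row in X]

# X*'X* should be diagonal, with the eigenvalues on the diagonal.
col0 = [r[0] for r in X_star]
col1 = [r[1] for r in X_star]
XsXs = [[dot(col0, col0), dot(col0, col1)],
        [dot(col1, col0), dot(col1, col1)]]

# alpha = P beta reproduces the fitted values: X beta == X* alpha.
beta = [1.5, 0.5]                    # hypothetical coefficient vector
alpha = [dot(P[0], beta), dot(P[1], beta)]
fit_original = [dot(row, beta) for row in X]
fit_rotated = [dot(row, alpha) for row in X_star]

print("eigenvalues:", eigvals)
print("X*'X*:", XsXs)
```

A very small second eigenvalue, as in this example, is the usual numerical signature of near-collinear predictors.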
Table 1. Closed Form Methods for Selecting k
1. Harmonic Mean (Hoerl, Kennard, and Baldwin, 1975).

2. Empirical Bayes (Lawless and Wang, 1976).

3. Iterative Estimation (Hoerl and Kennard, 1970).

4. Variance Normalization (Bulcock, Lee, and Luck, 1977).

5. Minimization of the Frequentist Expectation of the MSE (Dempster, Schatzoff, and
Wermuth, 1977).

6. RIDGM Bayesian Approach (Dempster, Schatzoff, and Wermuth, 1977).

7. Generalized Ridge (Hocking, Speed, and Lynn, 1976).
Appendix III: Ridge Trace

In linear estimation one postulates a model of the form

$$y = X\beta + e \qquad (20)$$



It follows from equation (20) that the residual sum of squares can be written as

$$\phi = (y - X\beta)'(y - X\beta) = (y - XB)'(y - XB) + (\beta - B)'X'X(\beta - B) \qquad (21)$$

where B is the least squares estimator.



The ridge trace can be shown to follow a path through the sum of squares surface such
that for a fixed $\phi$ a single $\beta$ is chosen which is of minimum length. This can be stated
precisely as follows: minimize $\beta'\beta$ subject to

$$(\beta - B)'(X'X)(\beta - B) = \phi_0 \qquad (22)$$

This is graphically illustrated in Appendix I, Figure 7.
As a Lagrangian problem this is

$$F = \beta'\beta + (1/k)\,[(\beta - B)'(X'X)(\beta - B) - \phi_0] \qquad (23)$$

where (1/k) is the multiplier. Then,

$$\partial F / \partial \beta = 2\beta + (1/k)\,[2(X'X)\beta - 2(X'X)B] = 0 \qquad (24)$$

Equation (24) reduces to

$$B(k) = (X'X + kI)^{-1} X'y \qquad (25)$$

The value of k is then chosen to satisfy the restraint imposed by equation (22). This is
the ridge estimator. In practice it is easier to choose $k \ge 0$ and then compute $\phi_0$.
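
The practical route described here, choosing k first and then computing the resulting sum of squares, is what a ridge trace does over a grid of k values. The Python sketch below is a minimal illustration under stated assumptions: the data are made up, only two predictors are used, and `ridge_2` and `residual_ss` are hypothetical helper names.

```python
# Sketch of a ridge trace for two predictors: coefficients, squared length,
# and residual sum of squares over a grid of k. Data and names are made up.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def ridge_2(XtX, Xty, k):
    """Solve (X'X + kI) b = X'y for a 2x2 system by Cramer's rule."""
    a, b = XtX[0][0] + k, XtX[0][1]
    c, d = XtX[1][0], XtX[1][1] + k
    det = a * d - b * c
    return [(d * Xty[0] - b * Xty[1]) / det,
            (a * Xty[1] - c * Xty[0]) / det]

def residual_ss(x1, x2, y, coef):
    """Residual sum of squares for a given coefficient vector."""
    return sum((yi - coef[0] * u - coef[1] * v) ** 2
               for u, v, yi in zip(x1, x2, y))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

XtX = [[dot(x1, x1), dot(x1, x2)], [dot(x2, x1), dot(x2, x2)]]
Xty = [dot(x1, y), dot(x2, y)]

ks = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0]
trace = []
for k in ks:
    coef = ridge_2(XtX, Xty, k)
    trace.append((k, coef, dot(coef, coef), residual_ss(x1, x2, y, coef)))

for k, coef, length2, phi in trace:
    print(f"k={k:4.2f}  b1={coef[0]:7.4f}  b2={coef[1]:7.4f}  "
          f"b'b={length2:7.4f}  phi={phi:8.5f}")
```

Reading down the printed table, the squared length of the coefficient vector falls while the residual sum of squares rises, which is the trade-off expressed by the constrained minimization.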






Appendix IV: Harmonic Mean Approach 

The approach derives from two assumptions. First, if X'X = I, then a minimum mean square
error is obtained if (Hoerl and Kennard, 1970)

$$k = p\sigma^2 / \beta'\beta \qquad (26)$$

Secondly, the general form of equation (18) is rewritten as

$$[X'X + P'KP]\, B(k) = X'y \qquad (27)$$

where $P'KP = kI_p$ when every diagonal element $k_i$ of K equals k. A minimum mean square
error must be obtained when (Hoerl and Kennard, 1970, p. 63)

$$k_i = \sigma^2 / \alpha_i^2 \qquad (28)$$

Hoerl, Kennard and Baldwin (1975) argue that if the $k_i$ are to be combined to obtain a
single value of k, one would not want to use the arithmetic mean, since very small $\alpha_i$ with
no predictive power would yield very large values for k. They suggest that a more
reasonable approach to averaging the $k_i$ is to employ the harmonic mean. That is,
calculate

$$1/k_h = (1/p) \sum_{i=1}^{p} 1/k_i = \alpha'\alpha / (p\sigma^2) \qquad (29)$$

The value of k is then given by

$$k_h = p\sigma^2 / \alpha'\alpha = p\sigma^2 / \beta'\beta \qquad (30)$$
The results represented by equations (29) and (30) indicate that a reasonable choice for
an automatic selection of k is an estimate of $(p\sigma^2 / \beta'\beta)$. And that is what is used,
namely

$$\hat{k} = p\hat{\sigma}^2 / B'B \qquad (31)$$
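
The harmonic-mean rule can be illustrated with a short Python sketch. Everything numeric here is hypothetical: the OLS coefficients and the residual mean square are made-up values, `harmonic_mean_k` and `hkb_k` are names introduced for this example, and an arbitrary rotation stands in for the eigenvector matrix P, since only the orthogonality of P matters for the identity between the two forms.

```python
# Sketch of the harmonic-mean choice of k.
# Assumptions: sigma2_hat and the OLS coefficients are given (hypothetical
# values), and an arbitrary rotation stands in for the eigenvector matrix P,
# because any orthogonal P preserves the squared length of the coefficients.
import math

def harmonic_mean_k(alpha, sigma2):
    """Harmonic mean of the individual k_i = sigma^2 / alpha_i^2."""
    p = len(alpha)
    return p / sum(a ** 2 / sigma2 for a in alpha)

def hkb_k(b, sigma2):
    """Closed form k = p * sigma^2 / (b'b)."""
    return len(b) * sigma2 / sum(c ** 2 for c in b)

b_ols = [1.46, 0.55]     # hypothetical OLS estimates
sigma2_hat = 0.02        # hypothetical residual mean square

theta = 0.7              # any angle gives an orthogonal P
P = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]
alpha_hat = [P[0][0] * b_ols[0] + P[0][1] * b_ols[1],
             P[1][0] * b_ols[0] + P[1][1] * b_ols[1]]

k_from_alpha = harmonic_mean_k(alpha_hat, sigma2_hat)
k_closed_form = hkb_k(b_ols, sigma2_hat)

print("harmonic mean of k_i:", k_from_alpha)
print("p * sigma^2 / B'B:   ", k_closed_form)
```

The two printed values coincide up to rounding, so in practice the closed form can be computed from the OLS output alone, without forming the individual k_i.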